{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 2, Topic 1: CPA Attack on 32bit AES (MAIN)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "NOTE: This lab references some (commercial) training material on [ChipWhisperer.io](https://www.ChipWhisperer.io). You can freely execute and use the lab per the open-source license (including using it in your own courses if you distribute similarly), but you must maintain notice about this source location. Consider joining our training course to enjoy the full experience.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**SUMMARY:** *So far, we've been focusing on a single implementation of AES, TINYAES128C (or AVRCRYPTOLIB, if you're on XMEGA). TINYAES128C, which is designed to run on a variety of microcontrollers, doesn't make any implementation specific optimizations. In this lab, we'll look at how we can break a 32-bit optimized version of AES using a CPA attack.*\n",
    "\n",
    "**LEARNING OUTCOMES:**\n",
    "\n",
    "* Understanding how AES can be optimized on 32-bit platforms.\n",
    "* Attacking an optimized version of AES using CPA"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optimizing AES\n",
    "\n",
    "A 32-bit machine can operate on 32-bit words, so it seems wasteful to use the same 8-bit operations. For example, if we look at the SBox operation:\n",
    "\n",
    "$\n",
    "b = sbox(state) = sbox(\\left[ \\begin{array}\n",
    "& S0 & S4 & S8 & S12 \\\\\n",
    "S1 & S5 & S9 & S13 \\\\\n",
    "S2 & S6 & S10 & S14 \\\\\n",
    "S3 & S7 & S11 & S15\n",
    "\\end{array} \\right]) = \\left[ \\begin{array}\n",
    "& S0 & S4 & S8 & S12 \\\\\n",
    "S5 & S9 & S13 & S1 \\\\\n",
    "S10 & S14 & S2 & S6 \\\\\n",
    "S15 & S3 & S7 & S11\n",
    "\\end{array} \\right]\n",
    "$\n",
    "\n",
    "we could consider each row as a 32-bit number and do three bitwise rotates instead of moving a bunch of stuff around in memory. Even better, we can speed up AES considerably by generating 32-bit lookup tables, called T-Tables, as was described in the book [The Design of Rijndael](http://www.springer.com/gp/book/9783540425809) which was published by the authors of AES.\n",
    "\n",
    "In order to take full advantage of our 32 bit machine, we can examine a typical round of AES. With the exception of the final round, each round looks like:\n",
    "\n",
    "$\\text{a = Round Input}$\n",
    "\n",
    "$\\text{b = SubBytes(a)}$\n",
    "\n",
    "$\\text{c = ShiftRows(b)}$\n",
    "\n",
    "$\\text{d = MixColumns(c)}$\n",
    "\n",
    "$\\text{a' = AddRoundKey(d) = Round Output}$\n",
    "\n",
    "We'll leave AddRoundKey the way it is. The other operations are:\n",
    "\n",
    "$b_{i,j} = \\text{sbox}[a_{i,j}]$\n",
    "\n",
    "$\\left[ \\begin{array} { c } { c _ { 0 , j } } \\\\ { c _ { 1 , j } } \\\\ { c _ { 2 , j } } \\\\ { c _ { 3 , j } } \\end{array} \\right] = \\left[ \\begin{array} { l } { b _ { 0 , j + 0 } } \\\\ { b _ { 1 , j + 1 } } \\\\ { b _ { 2 , j + 2 } } \\\\ { b _ { 3 , j + 3 } } \\end{array} \\right]$\n",
    "\n",
    "$\\left[ \\begin{array} { l } { d _ { 0 , j } } \\\\ { d _ { 1 , j } } \\\\ { d _ { 2 , j } } \\\\ { d _ { 3 , j } } \\end{array} \\right] = \\left[ \\begin{array} { l l l l } { 02 } & { 03 } & { 01 } & { 01 } \\\\ { 01 } & { 02 } & { 03 } & { 01 } \\\\ { 01 } & { 01 } & { 02 } & { 03 } \\\\ { 03 } & { 01 } & { 01 } & { 02 } \\end{array} \\right] \\times \\left[ \\begin{array} { c } { c _ { 0 , j } } \\\\ { c _ { 1 , j } } \\\\ { c _ { 2 , j } } \\\\ { c _ { 3 , j } } \\end{array} \\right]$\n",
    "\n",
    "Note that the ShiftRows operation $b_{i, j+c}$ is a cyclic shift and the matrix multiplcation in MixColumns denotes the xtime operation in GF($2^8$).\n",
    "\n",
    "It's possible to combine all three of these operations into a single line. We can write 4 bytes of $d$ as the linear combination of four different 4 byte vectors:\n",
    "\n",
    "$\\left[ \\begin{array} { l } { d _ { 0 , j } } \\\\ { d _ { 1 , j } } \\\\ { d _ { 2 , j } } \\\\ { d _ { 3 , j } } \\end{array} \\right] = \\left[ \\begin{array} { l } { 02 } \\\\ { 01 } \\\\ { 01 } \\\\ { 03 } \\end{array} \\right] \\operatorname { sbox } \\left[ a _ { 0 , j + 0 } \\right] \\oplus \\left[ \\begin{array} { l } { 03 } \\\\ { 02 } \\\\ { 01 } \\\\ { 01 } \\end{array} \\right] \\operatorname { sbox } \\left[ a _ { 1 , j + 1 } \\right] \\oplus \\left[ \\begin{array} { c } { 01 } \\\\ { 03 } \\\\ { 02 } \\\\ { 01 } \\end{array} \\right] \\operatorname { sbox } \\left[ a _ { 2 , j + 2 } \\right] \\oplus \\left[ \\begin{array} { c } { 01 } \\\\ { 01 } \\\\ { 03 } \\\\ { 02 } \\end{array} \\right] \\operatorname { sbox } \\left[ a _ { 3 , j + 3 } \\right]$\n",
    "\n",
    "Now, for each of these four components, we can tabulate the outputs for every possible 8-bit input:\n",
    "\n",
    "$T _ { 0 } [ a ] = \\left[ \\begin{array} { l l } { 02 \\times \\operatorname { sbox } [ a ] } \\\\ { 01 \\times \\operatorname { sbox } [ a ] } \\\\ { 01 \\times \\operatorname { sbox } [ a ] } \\\\ { 03 \\times \\operatorname { sbox } [ a ] } \\end{array} \\right]$\n",
    "\n",
    "$T _ { 1 } [ a ] = \\left[ \\begin{array} { l } { 03 \\times \\operatorname { sbox } [ a ] } \\\\ { 02 \\times \\operatorname { sbox } [ a ] } \\\\ { 01 \\times \\operatorname { sbox } [ a ] } \\\\ { 01 \\times \\operatorname { sbox } [ a ] } \\end{array} \\right]$\n",
    "\n",
    "$T _ { 2 } [ a ] = \\left[ \\begin{array} { l l } { 01 \\times \\operatorname { sbox } [ a ] } \\\\ { 03 \\times \\operatorname { sbox } [ a ] } \\\\ { 02 \\times \\operatorname { sbox } [ a ] } \\\\ { 01 \\times \\operatorname { sbox } [ a ] } \\end{array} \\right]$\n",
    "\n",
    "$T _ { 3 } [ a ] = \\left[ \\begin{array} { l l } { 01 \\times \\operatorname { sbox } [ a ] } \\\\ { 01 \\times \\operatorname { sbox } [ a ] } \\\\ { 03 \\times \\operatorname { sbox } [ a ] } \\\\ { 02 \\times \\operatorname { sbox } [ a ] } \\end{array} \\right]$\n",
    "\n",
    "These tables have 2^8 different 32-bit entries, so together the tables take up 4 kB. Finally, we can quickly compute one round of AES by calculating\n",
    "\n",
    "$\\left[ \\begin{array} { l } { d _ { 0 , j } } \\\\ { d _ { 1 , j } } \\\\ { d _ { 2 , j } } \\\\ { d _ { 3 , j } } \\end{array} \\right] = T _ { 0 } \\left[ a _ { 0 } , j + 0 \\right] \\oplus T _ { 1 } \\left[ a _ { 1 } , j + 1 \\right] \\oplus T _ { 2 } \\left[ a _ { 2 } , j + 2 \\right] \\oplus T _ { 3 } \\left[ a _ { 3 } , j + 3 \\right]$\n",
    "\n",
    "All together, with AddRoundKey at the end, a single round now takes 16 table lookups and 16 32-bit XOR operations. This arrangement is much more efficient than the traditional 8-bit implementation. There are a few more tradeoffs that can be made: for instance, the tables only differ by 8-bit shifts, so it's also possible to store only 1 kB of lookup tables at the expense of a few rotate operations.\n",
    "\n",
    "While the TINYAES128C library we've been using doesn't make this optimization, another library included with ChipWhisperer called MBEDTLS does."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "SCOPETYPE = 'OPENADC'\n",
    "PLATFORM = 'CWLITEARM'\n",
    "VERSION = 'HARDWARE'\n",
    "SS_VER = 'SS_VER_1_1'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "CRYPTO_TARGET = 'MBEDTLS' # overwrite auto inserted CRYPTO_TARGET"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if VERSION == 'HARDWARE':\n",
    "    %run \"Lab 2_1 - CPA on 32bit AES (HARDWARE).ipynb\"\n",
    "elif VERSION == 'SIMULATED':\n",
    "    %run \"Lab 2_1 - CPA on 32bit AES (SIMULATED).ipynb\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we plot the AES power trace:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cw.plot(project.waves[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You probably can't even pick out the different AES rounds anymore (whereas it was pretty obvious on TINYAES128C). MBED is also way faster - we only got part way into round 2 with 5000 samples of TINYAES, but with MBED we can finish the entire encryption in less than 5000 samples! Two questions we need to answer now are:\n",
    "\n",
    "1. Is it possible for us to break this AES implementation?\n",
    "1. If so, what sort of leakage model do we need?\n",
    "\n",
    "As it turns out, the answers are:\n",
    "\n",
    "1. Yes!\n",
    "1. We can continue to use the same leakage model - the SBox output\n",
    "\n",
    "This might come as a surprise, but it's true! Two of the t_table lookups are just the sbox[key^plaintext] that we used before. Try the analysis for yourself now and verify that this is correct:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import chipwhisperer.analyzer as cwa\n",
    "#pick right leakage model for your attack\n",
    "leak_model = cwa.leakage_models.sbox_output\n",
    "attack = cwa.cpa(project, leak_model)\n",
    "results = attack.run(cwa.get_jupyter_callback(attack))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Improving the Model\n",
    "\n",
    "While this model works alright for mbedtls, you probably wouldn't be surprised if it wasn't the best model to attack with. Instead, we can attack the full T-Tables. Returning again to the T-Tables:\n",
    "\n",
    "$T _ { 0 } [ a ] = \\left[ \\begin{array} { l l } { 02 \\times \\operatorname { sbox } [ a ] } \\\\ { 01 \\times \\operatorname { sbox } [ a ] } \\\\ { 01 \\times \\operatorname { sbox } [ a ] } \\\\ { 03 \\times \\operatorname { sbox } [ a ] } \\end{array} \\right]$\n",
    "\n",
    "$T _ { 1 } [ a ] = \\left[ \\begin{array} { l } { 03 \\times \\operatorname { sbox } [ a ] } \\\\ { 02 \\times \\operatorname { sbox } [ a ] } \\\\ { 01 \\times \\operatorname { sbox } [ a ] } \\\\ { 01 \\times \\operatorname { sbox } [ a ] } \\end{array} \\right]$\n",
    "\n",
    "$T _ { 2 } [ a ] = \\left[ \\begin{array} { l l } { 01 \\times \\operatorname { sbox } [ a ] } \\\\ { 03 \\times \\operatorname { sbox } [ a ] } \\\\ { 02 \\times \\operatorname { sbox } [ a ] } \\\\ { 01 \\times \\operatorname { sbox } [ a ] } \\end{array} \\right]$\n",
    "\n",
    "$T _ { 3 } [ a ] = \\left[ \\begin{array} { l l } { 01 \\times \\operatorname { sbox } [ a ] } \\\\ { 01 \\times \\operatorname { sbox } [ a ] } \\\\ { 03 \\times \\operatorname { sbox } [ a ] } \\\\ { 02 \\times \\operatorname { sbox } [ a ] } \\end{array} \\right]$\n",
    "\n",
    "we can see that for each T-Table lookup, the following is accessed:\n",
    "\n",
    "$\\operatorname {sbox}[a]$, $\\operatorname {sbox}[a]$, $2 \\times \\operatorname {sbox}[a]$, $3 \\times \\operatorname {sbox}[a]$\n",
    "\n",
    "so instead of just taking the Hamming weight of the SBox, we can instead take the Hamming weight of this whole access:\n",
    "\n",
    "$h = \\operatorname {hw}[\\operatorname {sbox}[a]] + \\operatorname {hw}[\\operatorname {sbox}[a]] + \\operatorname {hw}[2 \\times \\operatorname {sbox}[a]] + \\operatorname {hw}[3 \\times \\operatorname {sbox}[a]]$\n",
    "\n",
    "Again, ChipWhisperer already has this model built in, which you can access with `cwa.leakage_models.t_table`. Retry your CPA attack with this new leakage model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import chipwhisperer.analyzer as cwa\n",
    "#pick right leakage model for your attack\n",
    "leak_model = cwa.leakage_models.t_table\n",
    "attack = cwa.cpa(project, leak_model)\n",
    "results = attack.run(cwa.get_jupyter_callback(attack))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Did this attack work better than the previous one?\n",
    "\n",
    "## T-Tables for Decryption:\n",
    "\n",
    "Recall that the last round of AES is different than the rest of the rounds. Instead of it applying `subbytes`, `shiftrows`, `mixcolumns`, and `addroundkey`, it leaves out `mixcolumns`. You might expect that this means that decryption doesn't use a reverse T-Table in the first decryption round, but this isn't necessarily the case! Since `mixcolumns` is a linear operation, $\\operatorname{mixcolumns}( \\operatorname{key} + \\operatorname{state})$ is equal to  $\\operatorname{mixcolumns}(\\operatorname{key}) + \\operatorname{mixcolumns}(\\operatorname{state})$. Again, this is the approach that MBEDTLS takes, so we would be able to use the reverse T-Table to attack decryption."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
